Module/igv/1.0 #303

mannycruz · 2024-03-27T20:30:39Z

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for New Module

Required

If applicable

I added more granular output subdirectories.
I added rules to the reference_files workflow to generate any new reference files.
I added subdirectories with large intermediate files to the list of scratch_subdirectories in the default.yaml configuration file.
I updated the list of available wildcards for the input files in the default.yaml configuration file.

Checklist for Updated Module

Important! If you are updating the module version, ensure the previous version of the module is restored from master.
If you want to restore a deleted file or directory from the remote master, you can use git checkout origin/master path/to/file,
then a git commit will ensure that file is tracked on your branch again.
Example:

mv modules/strelka/1.1 modules/strelka/1.2
git checkout origin/master modules/strelka/1.1

mannycruz · 2024-03-27T20:31:33Z

modules/igv/1.0/config/default.yaml

+    igv:
+
+        inputs:
+            # Available wildcards: {unix_group} {seq_type} {tumour_sample_id} {normal_sample_id} {pair_status} {genome_build}


Is it okay to have {unix_group} as a wildcard?

It is not a standard column in the lcr-modules schema and is something specific to gambl. There are ways to deal with this - can show some examples and we can talk about this more before lab meeting today 😄
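One way to deal with a project-specific wildcard such as {unix_group} is to fill it in before the pattern reaches the module, leaving the standard wildcards untouched. Below is a minimal sketch in plain Python. The helper name `fill_wildcards` and the path template are assumptions made for illustration and are not part of lcr-modules; Snakemake's `expand(..., allow_missing=True)` performs a similar partial fill.

```python
import re

def fill_wildcards(pattern, **known):
    """Replace only the wildcards named in `known`; leave all others as-is."""
    def substitute(match):
        name = match.group(1)
        # Unknown wildcards are returned unchanged, braces included.
        return known.get(name, match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, pattern)

# Illustrative template only; the real path layout is project-specific.
template = "data/{unix_group}/{seq_type}--{genome_build}/{tumour_sample_id}.maf"
partially_filled = fill_wildcards(template, unix_group="gambl")
print(partially_filled)
```

Doing this in the launching snakefile keeps the module's own wildcard list limited to the standard schema columns.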

Kdreval

Thanks!!
I am mostly wondering whether we can reuse existing conda envs and scripts, to have a more standard approach and reduce the maintenance burden 😄
This looks great, it has come a long way since the early version! 🚀

Kdreval · 2024-03-28T18:04:41Z

modules/igv/1.0/config/default.yaml

+            maf: "__UPDATE__"
+
+        regions: 
+            # Provide regions files as lists in their respective genome builds so that liftover of coordinates occurs properly


What happens if nothing is specified here? Can we add the "__UPDATE__" to anything that has to be filled in?

I added an "__UPDATE__" string and specified that at least one regions file must be provided
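As a sketch of how unfilled placeholders could be reported to the user, the snippet below walks a nested config and lists every value still set to "__UPDATE__". The function name `find_unset` and the example dictionary are hypothetical; this does not describe any check the module framework itself may perform.

```python
def find_unset(config, path=()):
    """Yield the key path of every value still set to the "__UPDATE__" placeholder."""
    if isinstance(config, dict):
        for key, value in config.items():
            yield from find_unset(value, path + (key,))
    elif isinstance(config, (list, tuple)):
        for index, value in enumerate(config):
            yield from find_unset(value, path + (index,))
    elif config == "__UPDATE__":
        yield path

# Hypothetical excerpt shaped like the config hunks in this review.
example = {
    "inputs": {"maf": "__UPDATE__"},
    "regions": {"maf": {"grch37": [], "hg38": []}},
    "options": {"genome_map": {"grch37": ["__UPDATE__"]}},
}
unset = list(find_unset(example))
print(unset)
```

Note that empty lists, as in the regions block, are not flagged by this sketch; a rule such as "at least one regions file must be provided" would need its own check.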

Kdreval · 2024-03-28T18:09:46Z

modules/igv/1.0/config/default.yaml

+            maf:
+                grch37: []
+                hg38: []
+            mutation_id:


Can we add an example here of the formatting expected in that file? Does it need a header, and does it expect certain column names?

Done! I added which columns and format are required for the mutation_id file.

Kdreval · 2024-03-28T18:11:21Z

modules/igv/1.0/config/default.yaml

+        options:
+            genome_map: 
+                # Map metadata builds to grch37 and hg38 so that MAF file locations are determined correctly
+                grch37: ["__UPDATE__"] # e.g ["grch37","hg19","hs37d5"]


You more or less know this upfront, so maybe it is better to fill in these keys; the comment can then say that other genome builds can be added to the list.
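To make the purpose of genome_map concrete, here is a small sketch of how such a mapping could be inverted into a lookup from metadata build names to the two reference builds. The grch37 aliases come from the example in the config comment; the hg38 aliases and the function name `invert_genome_map` are assumptions for illustration.

```python
def invert_genome_map(genome_map):
    """Map each metadata build name to the reference build it belongs to."""
    lookup = {}
    for target_build, aliases in genome_map.items():
        for alias in aliases:
            if alias in lookup:
                raise ValueError(f"{alias!r} is listed under more than one build")
            lookup[alias] = target_build
    return lookup

genome_map = {
    "grch37": ["grch37", "hg19", "hs37d5"],  # from the example in the config comment
    "hg38": ["hg38", "grch38"],              # assumed for illustration
}
lookup = invert_genome_map(genome_map)
print(lookup["hs37d5"])
```

Pre-filling the lists with the builds that are already known would mean users only edit this section when their metadata uses an unusual build name.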

Kdreval · 2024-03-28T18:13:33Z

modules/igv/1.0/config/default.yaml

+
+            liftover_regions:
+                reference_chain_file:
+                    grch37: "genomes/grch37/chains/grch37/hg19ToHg38.over.chain"


I think this is generated with the reference files so you can directly request the reference files output instead of having this listed here. This can be an example:

lcr-modules/modules/liftover/2.0/liftover.smk

Lines 108 to 130 in 7b495a3

def get_chain(wildcards):
    if "38" in str({wildcards.genome_build}):
        return reference_files("genomes/{genome_build}/chains/grch38/hg38ToHg19.over.chain")
    else:
        return reference_files("genomes/{genome_build}/chains/grch37/hg19ToHg38.over.chain")

# Convert the bed file in hg38 coordinates into hg19 coordinates
rule _run_liftover:
    input:
        native = rules._liftover_convert_2_bed.output.bed,
        chains = get_chain
    output:
        lifted = temp(CFG["dirs"]["liftover"] + "from--{seq_type}--{genome_build}/{tumour_id}--{normal_id}--{pair_status}.{tool}--{type}.lifted_{chain}.bed"),
        unmapped = CFG["dirs"]["liftover"] + "from--{seq_type}--{genome_build}/{tumour_id}--{normal_id}--{pair_status}.{tool}--{type}.lifted_{chain}.unmapped.bed"
    log:
        stderr = CFG["logs"]["liftover"] + "from--{seq_type}--{genome_build}/{tumour_id}--{normal_id}--{pair_status}.{tool}--{type}.lifted_{chain}.stderr.log"
    params:
        mismatch = CFG["options"]["min_mismatch"]
    conda:
        CFG["conda_envs"]["liftover-366"]
    wildcard_constraints:
        chain = "hg38ToHg19|hg19ToHg38"

Kdreval · 2024-03-28T18:16:40Z

modules/igv/1.0/config/default.yaml

+                run_unpaired_tumours_with: "unmatched_normal"
+                run_paired_tumours_as_unpaired: False
+
+    slms_3:


Can you instead just include the config from slms-3?
I think this will conflict if you have a snakefile that attempts to run both slms-3 and this module: snakemake won't like duplicated config values, or it will just use whatever is imported last, which may have unexpected consequences in that scenario.

Yes good idea, I removed this part from the config
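The concern about duplicated keys can be illustrated with a short sketch. It assumes configs are merged recursively with later values overriding earlier ones for the same key; the `merge` helper, the `threads` key, and the values shown are illustrative and are not Snakemake's implementation.

```python
import copy

def merge(base, override):
    """Recursively merge `override` into a copy of `base`; later values win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical values: the same slms_3 key defined by two different configs.
slms3_config = {"slms_3": {"run_paired_tumours_as_unpaired": False, "threads": 4}}
igv_config = {"slms_3": {"run_paired_tumours_as_unpaired": True}}

# Loading the IGV config second silently changes the value slms-3 would see.
combined = merge(slms3_config, igv_config)
print(combined)
```

Keeping the slms_3 block in a single config avoids this kind of order-dependent behaviour.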

Kdreval · 2024-03-28T18:31:27Z

modules/igv/1.0/igv.smk

+        bam = CFG["dirs"]["inputs"] + "bams/{seq_type}/{sample_id}.bam",
+        bai = CFG["dirs"]["inputs"] + "bams/{seq_type}/{sample_id}.bam.bai"
+    run:
+        op.absolute_symlink(input.bam, output.bam)


Curious whether this needs a crai index when the bam is a cram 😃
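If CRAM inputs were supported, the index path would need to follow the alignment format. The sketch below only shows the conventional index naming (`.bam.bai` and `.cram.crai`); the helper name `index_path` is hypothetical, and whether IGV snapshots work with CRAM input in this module is left open in the thread.

```python
from pathlib import Path

# Conventional index suffix appended to each alignment format.
INDEX_SUFFIX = {".bam": ".bai", ".cram": ".crai"}

def index_path(alignment):
    """Return the conventional index path for a BAM or CRAM file."""
    suffix = Path(alignment).suffix.lower()
    if suffix not in INDEX_SUFFIX:
        raise ValueError(f"Unsupported alignment file: {alignment}")
    return alignment + INDEX_SUFFIX[suffix]

print(index_path("bams/genome/sample1.bam"))
print(index_path("bams/genome/sample1.cram"))
```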

Kdreval · 2024-03-28T18:35:48Z

modules/igv/1.0/igv.smk

+    input:
+        maf = get_maf
+    output:
+        maf = CFG["dirs"]["inputs"] + "maf/{seq_type}--{genome_build}/{tumour_id}--{normal_sample_id}--{pair_status}.maf"


I think the tumour_id wildcard here might be missing the "sample" part, compared with the wildcards listed in the config?
Here is the part from config:

# Available wildcards: {unix_group} {seq_type} {tumour_sample_id} {normal_sample_id} {pair_status} {genome_build}

Kdreval · 2024-03-28T18:41:44Z

modules/igv/1.0/igv.smk

+    script:
+        config["lcr-modules"]["igv"]["scripts"]["format_regions"]
+
+REGIONS_FORMAT = {


Why do we need a dictionary if everything is mapped to the same key bed? Maybe we can just refer to bed regardless of what is the region format?

You're right- I removed it

Kdreval · 2024-03-28T18:44:48Z

modules/igv/1.0/igv.smk

+    script:
+        config["lcr-modules"]["igv"]["scripts"]["filter_script"]
+
+def _get_maf(wildcards):


Curious why this does not conflict with the function defined above?

def get_maf(wildcards):
    unix_group = config["unix_group"]
    return expand(config["lcr-modules"]["igv"]["inputs"]["maf"], allow_missing=True, unix_group=unix_group)

Just because it is defined later the further call will use the "latest" definition?

I removed the earlier function because I do not need it to fill in unix_group value (this can be done from the snakefile that is used to launch the run), so not an issue anymore
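On the Python side of that question, here is a sketch with placeholder function bodies. `get_maf` and `_get_maf` are distinct names, so they coexist without conflict; only a second `def` with the same name rebinds it, and anything that captured the earlier function object keeps using that object.

```python
def get_maf(wildcards):
    return "first definition"

def _get_maf(wildcards):
    # Different name (leading underscore), so this does not replace get_maf.
    return "underscore variant"

# A reference taken now keeps pointing at the first function object,
# much as a rule would if it stored the function as its input.
captured = get_maf

def get_maf(wildcards):  # same name: rebinds get_maf from here on
    return "second definition"

results = (captured(None), get_maf(None), _get_maf(None))
print(results)
```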

Kdreval · 2024-03-28T18:48:32Z

modules/igv/1.0/igv.smk

+
+rule _igv_download_igv:
+    output:
+        igv_zip = CFG["dirs"]["igv"] + "IGV_2.7.2.zip",


What if someone doesn't run this on Linux or wants a different version? Maybe this part should be configurable or, better still, use the conda package instead of downloading the file?
https://anaconda.org/bioconda/igv
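A minimal sketch of making the version configurable, assuming a hypothetical version string supplied by the user and the `IGV_<version>.zip` naming seen in the rule output above. The function name `igv_zip_path` and the directory used in the example are illustrative.

```python
import re

def igv_zip_path(igv_dir, version):
    """Build the output path of the IGV archive for a configured version."""
    # Reject anything that is not a plain dotted version such as 2.7.2.
    if not re.fullmatch(r"\d+(\.\d+)+", version):
        raise ValueError(f"Unexpected IGV version string: {version!r}")
    return f"{igv_dir}IGV_{version}.zip"

print(igv_zip_path("results/igv/", "2.7.2"))
```

Validating the version string before building the path keeps a typo in the config from turning into a confusing download failure later in the workflow.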

mannycruz · 2024-08-26T19:30:08Z

I have addressed most of the comments above, but I still need to investigate:

whether CRAMs can be used for snapshots (Module/igv/1.0 #303 (comment))
downloading IGV versus using a conda environment (Module/igv/1.0 #303 (comment))

Plus, I have made some more changes since then, so I will ask for an updated review once I'm finished.

mannycruz added 30 commits January 31, 2023 02:17

initial IGV module, specify regions using MAF

bcf9916

Add rules to reformat input regions files

f2091a4

Add script to perform regions reformatting

f6980b9

Add script to filter maf based on BED or MAF

e54b85c

Modify liftover script to accomodate BED regions

f46ff3f

Update config for changes made to module

d0693a6

Remove commented out text

8fa8bab

Add function to reformat hotmaps MAF results

d3b8e1c

Add function to format mutation_id regions file

3791dd1

Move constraint on n snaps/variant to filter step

e0c184d

Remove metadata file option, fix pairing config

07bea16

Grammar

ed501fb

Convert from snakemake shell to script directive

2bb98e7

Add sample_id tracking, metadata = CFG["samples"]

ff248e6

Overhaul to sample_id-dependent workflow

e389982

Add capability for VCF files as regions

31cba38

Allow filtered MAF files to be temp

77f9d76

Track what snapshots will be created (draft)

0bf93ad

Workflow changed to run per sample-variant combo

f0047d5

Merge variant batch scripts to prevent IGV crash

a9a27fe

Remove conda environment in filter_maf rule

28c54f1

Add symlinked snapshots to workflow targets

4318b8d

Clean up comment lines

7805531

Remove "exit" line from position batch scripts

0f4c924

Fix variable referenced before assignment error

383ff17

Add log outputs to igv run

13d6992

Add dependency to regions files to checkpoint rule

c9a8abc

Set thread limits on batch creation and IGV run

6bb0861

Clean up format_regions script

351019f

Clean up subdirectories

9764101

mannycruz added 13 commits February 10, 2024 06:37

Reorder config values to make it more understandable (i hope)

fea42ec

Group variants by position so that snapshot instructions for multiple…

b0a69bf

… alleles in one position are incorporated into the batch scripts

Update local rules

50bfa6c

Update oncodriveclustl results reformatting using new module outputs …

746b073

…(expanded genomic coordinate output)

Add blank line to the end of scripts

7c536f1

Update CHANGELOG

27363e9

Add slms 3 pairing config so that slms_3 can be set in the config["to…

24ea395

…ol_names"]

Add more info to default config

0ee08cb

Merge branch 'master' of github.com:LCR-BCCRC/lcr-modules into module…

f2f6807

…/igv/1.0

Fix typo

e044266

Merge branch 'master' of github.com:LCR-BCCRC/lcr-modules into module…

c1f3806

…/igv/1.0

Remove outdated commented

0a7b028

Add more descriptions to config, reduce timelimit of IGV run

0f21edc

mannycruz commented Mar 27, 2024

Update changelog, add empty line to end of script

4e01bac

Kdreval self-requested a review March 28, 2024 17:43

Kdreval self-assigned this Mar 28, 2024

Kdreval reviewed Mar 28, 2024

mannycruz added 7 commits April 4, 2024 12:39

Switch to using liftover and add resources option to symlink rule

a821032

Merge branch 'master' of github.com:LCR-BCCRC/lcr-modules into module…

e48e65e

…/igv/1.0

Fix typo in config

7d462f7

Add more information for mutation_id file format

931a8b5

Clean up conda envs

e922df5

Remove scripy that runs crossmap since switched to liftover

2cdf675

Remove unnecessary REGIONS_FORMAT dict

7290191

mannycruz added 4 commits September 1, 2024 15:53

Allow ability to specify version of IGV to download

5b9551c

Fix typo in list of config wildcards so tumour_id matches wildcard in…

f864881

… snakefile

Add ability to specify bam path in config

7df9f19

Fix string indexing for mutation_id formatted files

ef47a17
